NSF PAR Search | NSF Public Access Repository

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

https://doi.org/10.1007/978-3-031-72670-5_12

Chatterjee, Agneet; Stan, Gabriela Ben Melech; Aflalo, Estelle; Paul, Sayak; Ghosh, Dhruba; Gokhale, Tejas; Schmidt, Ludwig; Hajishirzi, Hannaneh; Lal, Vasudev; Baral, Chitta; et al (September 2024, Springer Nature Switzerland)

Full Text Available
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment

Ghosh, Dhruba; Hajishirzi, Hannaneh; Schmidt, Ludwig (December 2023, NeurIPS)

Full Text Available
Measuring and Narrowing the Compositionality Gap in Language Models

https://doi.org/10.18653/v1/2023.findings-emnlp.378

Press, Ofir; Zhang, Muru; Min, Sewon; Schmidt, Ludwig; Smith, Noah A; Lewis, Mike (December 2023, Findings of the Association for Computational Linguistics: EMNLP 2023)

Full Text Available
Effective Robustness against Natural Distribution Shifts for Models with Different Training Data

Shi, Zhouxing; Carlini, Nicholas; Balashankar, Ananth; Schmidt, Ludwig; Hsieh, Cho-Jui; Beutel, Alex; Qin, Yao (December 2023, Advances in neural information processing systems)

Full Text Available
DataComp-LM: In search of the next generation of training sets for language models

Li, Jeffrey; Fang, Alex; Smyrnis, Georgios; Ivgi, Maor; Jordan, Matt; Gadre, Samir; Bansal, Hritik; Guha, Etash; Keh, Sedrick; Arora, Kushal; et al (April 2025, https://doi.org/10.48550/arXiv.2406.11794)

The authors introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments aimed at improving language models. DCLM provides a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants can experiment with dataset curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline, the authors find that model-based filtering is critical for assembling a high-quality training set. Their resulting dataset, DCLM-Baseline, enables training a 7B-parameter model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. This is a 6.6 percentage point improvement over MAP-Neo (the previous state of the art in open-data LMs), achieved with 40% less compute. The baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% and 66%, respectively), and performs similarly on an average of 53 NLU tasks, while using 6.6x less compute than Llama 3 8B. These findings emphasize the importance of dataset design for training LMs and establish a foundation for further research on data curation.
Free, publicly-accessible full text available April 21, 2026
Homekit2020: A Benchmark for Time Series Classification on a Large Mobile Sensing Dataset with Laboratory Tested Ground Truth of Influenza Infections

Merrill, Mike A; Safranchik, Esteban; Kolbeinsson, Arinbjörn; Gade, Piyusha; Ramirez, Ernesto; Schmidt, Ludwig; Foschini, Luca; Althoff, Tim (January 2023, Conference on Health, Inference, and Learning (CHIL))

Despite increased interest in wearables as tools for detecting various health conditions, there are as yet no large public benchmarks for such mobile sensing data. The few datasets that are available do not contain data from more than dozens of individuals, do not contain high-resolution raw data, or do not include dataloaders for easy integration into machine learning pipelines. Here, we present Homekit2020: the first large-scale public benchmark for time series classification of wearable sensor data. Our dataset contains over 14 million hours of minute-level multimodal Fitbit data, symptom reports, and ground-truth laboratory PCR influenza test results, along with an evaluation framework that mimics realistic model deployments and efficiently characterizes statistical uncertainty in model selection in the presence of extreme class imbalance. Furthermore, we implement and evaluate nine neural and non-neural time series classification models on our benchmark across 450 total training runs in order to establish state-of-the-art performance.
Full Text Available
Retiring Adult: New Datasets for Fair Machine Learning

Ding, Frances; Hardt, Moritz; Miller, John; Schmidt, Ludwig (January 2021, Advances in Neural Information Processing Systems 34 (NeurIPS 2021))

Full Text Available
Unlabeled data improves adversarial robustness

Carmon, Yair; Raghunathan, Aditi; Schmidt, Ludwig; Duchi, John (December 2019, Advances in neural information processing systems)

We demonstrate, theoretically and empirically, that adversarial robustness can significantly benefit from semisupervised learning. Theoretically, we revisit the simple Gaussian model of Schmidt et al. that shows a sample complexity gap between standard and robust classification. We prove that unlabeled data bridges this gap: a simple semisupervised learning procedure (self-training) achieves high robust accuracy using the same number of labels required for achieving high standard accuracy. Empirically, we augment CIFAR-10 with 500K unlabeled images sourced from 80 Million Tiny Images and use robust self-training to outperform state-of-the-art robust accuracies by over 5 points in (i) ℓ∞ robustness against several strong attacks via adversarial training and (ii) certified ℓ2 and ℓ∞ robustness via randomized smoothing. On SVHN, adding the dataset's own extra training set with the labels removed provides gains of 4 to 10 points, within 1 point of the gain from using the extra labels.
Full Text Available
Model Reconstruction from Model Explanations

Milli, Smitha; Schmidt, Ludwig; Dragan, Anca; Hardt, Moritz (April 2019, In Proceedings of ACM FAT* 2019)

Full Text Available
Exploring the Landscape of Spatial Robustness

Engstrom, Logan; Tran, Brandon; Tsipras, Dimitris; Schmidt, Ludwig; Madry, Aleksander (June 2019, Proceedings of Machine Learning Research)

Full Text Available
